
[test] Fix invalid ref.is_null after select #1521

Closed
wants to merge 1 commit

Conversation

@ia0 (Contributor) commented Aug 17, 2022

`select` without a type annotation produces a numeric or vector type, while `ref.is_null` consumes a reference type. Since those types are disjoint, `select` followed by `ref.is_null` can never be well-typed.

The test is fixed by removing the `select`, assuming the intent of the test is to check whether `ref.is_null` can be called after `unreachable`.

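A minimal sketch of the change (hypothetical; the actual test file differs):

;; before: the sequence this PR considers ill-typed
(func (result i32)
  (unreachable)
  (select)
  (ref.is_null))

;; after: ref.is_null directly follows unreachable
(func (result i32)
  (unreachable)
  (ref.is_null))
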
@conrad-watt (Contributor)

IIUC, the behaviour captured by this test is actually intended, because in syntactically dead code select (with no annotation) is allowed to propagate a "bottom type", which can subsequently be used in place of a reference type during validation.

refs:
  • bottom type
  • unreachable typing
  • select typing

I think I might see a drafting error in the current spec though: the result type of unreachable is stated to be an arbitrary sequence of "value types" (with this wording used in a few other places for syntactically dead code), but I think it should be an arbitrary sequence of opdtype. @rossberg, thoughts?
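
Concretely, the intended typing of the test's sequence goes something like this (a sketch, writing the bottom type as ⊥):

(unreachable)  ;; the stack becomes polymorphic
(select)       ;; pops ⊥, ⊥, i32 from the polymorphic stack; may produce ⊥
(ref.is_null)  ;; ⊥ can stand in for a reference type, so this checks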

@ia0 (Contributor, Author) commented Aug 17, 2022

I see, I was actually wondering what the use of bottom types was, since they are not used at all in the spec; see this question on zulipchat. If stack-polymorphic operations indeed produced bottom types, that would make a difference (the "function type" wording in the definition of "stack-polymorphic" would need to become "stack type").

But even with the bottom-type issue fixed, I still believe that chaining select and ref.is_null is a bit weird. The stack being polymorphic is not, by itself, a reason for select to produce a bottom type. I see this as a unification question: select has the polymorphic type ∀α (α ∈ numtype ∪ vectype) ⇒ (α × α × i32) → α, and when the stack is polymorphic, we can pick any value type satisfying the constraint and produce that type. That type is still left unconstrained (it can be any numeric or vector type), but it cannot be a reference type. So (select) (i32.clz) is ok, and (select) (f32.abs) is ok too. But (select) (ref.is_null) is not, because there is no context in which it possibly could be (assuming we don't make the exception currently being discussed).
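
Under that unification reading, the three fragments mentioned above behave as follows (a sketch; the spec's actual rule is the relaxed one described earlier):

(unreachable) (select) (i32.clz)      ;; ok: α unifies with i32
(unreachable) (select) (f32.abs)      ;; ok: α unifies with f32
(unreachable) (select) (ref.is_null)  ;; rejected: α ranges over numtype ∪ vectype only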

@conrad-watt (Contributor) commented Aug 17, 2022

The stack being polymorphic is not, by itself, a reason for select to produce a bottom type.

The way I think of this is that (riffing off your syntax) select has the constraint

∀α (α ∈ numtype ∪ vectype ∪ bottype) ⇒ (α × α × i32) → α

and bottype is only available for consumption in syntactically dead code. The reason we intended this semantics is so that the concrete type checking algorithm doesn't actually need to propagate this constraint during validation (assuming a polymorphic stack), since bottype is always the most permissive choice.

EDIT:

the "function type" wording of the definition of "stack-polymorphic" would need to become "stack type"

Yes, we may need to double-check some of the wording of the spec...

@ia0 (Contributor, Author) commented Aug 17, 2022

Yes, I agree select is currently written with your definition, because α (or t in the spec) is an operand type. I just find it weird to allow patterns in dead code that would be impossible in live code. I believe select to be the only such exception.

A related question is then: why is it even necessary to validate dead code? Why not simply jump over it as long as it's syntactically correct (i.e. it was parsed by the text or binary format)? It looks to me that the stack-polymorphic notion was introduced as a way to accept otherwise valid code when validating dead code. For example, why would (select) (ref.is_null) be ok while (i32.const 0) (ref.is_null) is not? Neither pattern can appear in live code, yet we still accept one. I believe this to be an unintended weirdness. It would make sense to me to either accept arbitrary unreachable code, or to accept only code for which there is at least one context in which it would be live and valid.
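
A sketch of the contrast, with both fragments placed in dead code:

(unreachable) (select) (ref.is_null)       ;; accepted: select may produce a bottom type
(unreachable) (i32.const 0) (ref.is_null)  ;; rejected: i32 is not a reference type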

@conrad-watt (Contributor)

Why not simply jump over it as long as it's syntactically correct (i.e. it was parsed by the text or binary format)?

I believe this to be an unintended weirdness.

FWIW the current situation with dead code is a personal bugbear of mine! There are a couple of subtleties with the "skip" idea, since there's been a feeling in the group that we don't want to inadvertently create a distinction between "decode-time" and "validation-time" errors, but I have a proposal, "relaxed dead code validation", waiting in the wings that would do something similar. We've not had the bandwidth to push it forward recently, but IMHO we'll eventually bump up against a planned feature that stresses our dead code weirdness to the point that it makes sense to adopt it (GC types may yet be such a feature!).

@ia0 (Contributor, Author) commented Aug 17, 2022

Interesting, this proposal is similar to how I implement validation, i.e. the idea of wiping the stack and marking it as "polymorphic" (the bottom notation and unreachable boolean in the proposal). The difference with what I do is that when the stack is polymorphic, push and pop operations are not checked in the proposal, whereas on my side it is always possible to pop from such a stack: you get a meta-variable that can unify with any value type (push is carried out as usual). This makes it possible to defer the "guessing" part of polymorphic stacks until it's needed.

There are a couple of subtleties with the "skip" idea

Yes, there are some design choices. But my understanding is that those choices should be coherent across some properties:

  • Valid syntax: if some dead code must be syntactically valid, then all dead code must.
  • Valid non-stack preconditions: if some dead code must satisfy its non-stack preconditions, then all dead code must.
  • Valid if it were live: if some dead code must be valid if it were live (in an appropriate context), then all dead code must.

We can choose which properties are ignored (no dead code is required to observe the property, though it may by accident, i.e. it's not a precondition) or used (all dead code must observe the property).

@ia0 (Contributor, Author) commented Aug 19, 2022

Actually, another weirdness I've found (not sure if there's a test for it) is br_table. The spec says the stack only needs to match each of the labels. I think it should be that the stack matches a single result type and that all labels must match that result type.

AFAICT, my opinion actually matches what the reference interpreter does:

;; a.wat
(module (func
  (block (result f64)
    (block (result f32)
      (unreachable)
      (br_table 0 1)))
  (drop)))
$ wasm -d a.wat
a.wat:2.3-5.23: invalid module: type mismatch: instruction requires [f64] but stack has [f32]

I would suggest fixing both the select and br_table validation rules to use value types (actually, all rules should use value types; see below). The sub-typing is done in the validation rule for sequences of instructions anyway, so individual instructions don't need to deal with sub-typing.

I would even go slightly further and say that the bottom type should be removed and that stack types should just be function types. The stack-polymorphic definition already takes into account the possibility of supporting arbitrary value types; there's no need for a notion of sub-typing, which doesn't exist at execution anyway.

Curious to know what the design direction is regarding those issues.

@conrad-watt (Contributor) commented Aug 19, 2022

Actually, another weirdness I've found...

Oof. Good catch!

I think it should be that the stack matches a result type and all labels must be that result type.

This would mean that, with the GC proposal, an implementation would need to calculate LUBs in dead code in cases where two label types might share a common supertype. We're specifically avoiding LUB calculation in the type checking algorithm, so this solution probably won't be possible.
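
For instance, in this hypothetical fragment the two labels carry unrelated reference types (externref and funcref have no common supertype before the GC proposal):

(block (result funcref)
  (block (result externref)
    (unreachable)
    (br_table 0 1))  ;; current rule: the polymorphic stack matches each label separately
  (drop)
  (ref.null func))

Requiring a single result type here would instead mean finding one type that matches both externref and funcref, i.e. computing (or failing to compute) a LUB.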

I would even go slightly further and say that the bottom type should be removed and stack type should just be function type. The stack polymorphic definition is already taking into account the possibility to support arbitrary value type, there's no need for a notion of sub-typing which doesn't exist at execution anyway.

Curious to know what the design direction is regarding those issues.

The current design was motivated by a desire to avoid any meaningful constraint tracking in the concrete type checking algorithm. Prior to reference types, it was expected that implementations would need value types plus a special any/unknown polymorphic symbol in order to conduct type checking (specifically to handle cases with select, e.g. (unreachable) (select) (i32.add)). You can see this in the type checking algorithm suggested by the 1.0 spec. Whether or not this was originally a design mistake, we unfortunately have to remain backwards-compatible with this choice.
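
Spelled out, that case proceeds roughly like this in such an algorithm (a sketch, writing the special symbol as unknown):

(unreachable)  ;; the stack becomes polymorphic
(select)       ;; pops unknown, unknown, i32; pushes unknown
(i32.add)      ;; pops unknown (accepted as i32) and i32; pushes i32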

When adding reference types, we didn't want to inadvertently impose the requirement that implementations would now need to additionally track constraints on this symbol like α ∈ numtype. So adding bottom to the type system explicitly was the solution.

My (purely personal) opinion is that our current handling of dead code has reached a level of conceptual complexity such that the spec is slipping up in describing it, and implementations aren't really implementing it correctly anyway. The relaxed dead code validation proposal above is the result of a fair bit of conversation about how to simplify the situation in a backwards-compatible way while avoiding a type checking algorithm with constraint tracking or LUB calculation.

EDIT: apologies for the fat-fingered issue closing

@conrad-watt reopened this Aug 19, 2022
@ia0 (Contributor, Author) commented Aug 19, 2022

This would mean that, with the GC proposal, an implementation would need to calculate LUBs in dead code

Oh, I see that the GC proposal actually introduces sub-typing in execution. This means validation would also need this sub-typing, and this real sub-typing will make things quite complicated when combined with the fake sub-typing of stack- and value-polymorphic operations. I see how the relaxed dead-code proposal could be a prerequisite for the GC proposal to remain tractable, by avoiding the fake sub-typing of unreachable code.

Whether or not [the special any polymorphic symbol] was originally a design mistake, we unfortunately have to remain backwards-compatible with this choice.

I understand that backward-compatibility is important. But I don't think the way validation is defined needs to be backward-compatible. What needs to be backward-compatible is the set of valid modules (it's ok to grow the set but not to shrink it). The way this set is defined shouldn't matter, since implementations can implement the set-membership test however they want. The actual validation rules are just one way to describe this set.

But I also understand that without a formal specification, it's easy to define different sets inadvertently. It thus feels safer not to touch the validation definition too much (although proposals definitely will).

EDIT: apologies for the fat-fingered issue closing

No worries :-)

@conrad-watt (Contributor)

I understand that backward-compatibility is important. But I don't think how validation is defined needs to be backward-compatible. What needs to be backward-compatible is the set of valid modules (it's ok to increase the set but not reduce it). The way this set is defined shouldn't matter since implementations can choose to implement this set membership test the way they want. The actual validation rules are just one way to describe this set.

Yes, apologies for any imprecision. I meant the notion of backwards-compatible that you're describing!

@rossberg (Member)

As @conrad-watt says, this test is intended. The underlying relaxation and the introduction of the bottom type were the smallest possible change that avoided the need for computing lubs/glbs during validation, as well as the need for unification variables, neither of which would be desirable for Wasm.

The same reasoning led to the current relaxed rule for br_table and the use of subtyping premises outside sequencing.

There is a loooong history to the whole question of dead code validation. Wrt the current semantics, some of its original motivation is explained here.

@conrad-watt:

I think I might see a drafting error in the current spec though: the result type of unreachable is stated to be an arbitrary sequence of "value types" (with this wording used in a few other places for syntactically dead code), but I think it should be an arbitrary sequence of opdtype. @rossberg, thoughts?

Yes, that's an oversight. The notion of operand type was a temporary last-minute hack to fix fall-out from the late removal (deferral) of subtyping from the reftype proposal. So there are likely places I overlooked. Fortunately, that hack is going away again with the funcref proposal. :)

@ia0 (Contributor, Author) commented Aug 23, 2022

Thanks for the link to the rationale and the funcref proposal. Let me try to recap to see if I understand the situation correctly:

  • There is no problem with (select) (ref.is_null). This is a well-typed sequence of instructions (even though it can only be well-typed in unreachable code) and as such doesn't break "decomposability" as defined by the rationale (i.e. considering only validity regardless of reachability).
  • There is a problem with the reference interpreter rejecting (block (result f64) (block (result f32) (unreachable) (br_table 0 1))). EDIT: There is actually no problem here, see the comments below.
  • There is a problem with the spec. It should be fixed with the following (at least):
    • stack-polymorphic is defined with "stack type" instead of "function type"
    • unreachable is defined with "operand type" instead of "value type"

Feel free to close this PR (unless my understanding above is wrong).

@rossberg (Member) commented Aug 23, 2022

Almost. The interpreter is actually correct to reject the snippet (block (result f64) (block (result f32) (unreachable) (br_table 0 1))). The reason is unrelated to unreachable and br_table: the body of the outer block ends in an instruction producing an f32 (namely, the inner block), although an f64 is required. If you fix that, then the interpreter correctly accepts the code, e.g.: (block (result f64) (block (result f32) (unreachable) (br_table 0 1)) (drop) (f64.const 0)).
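
For reference, here is that accepted snippet expanded into a complete module, in the style of a.wat above:

;; fixed.wat
(module (func
  (block (result f64)
    (block (result f32)
      (unreachable)
      (br_table 0 1))
    (drop)
    (f64.const 0))
  (drop)))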

@ia0 (Contributor, Author) commented Aug 23, 2022

Oh I see, my bad. Thanks for the explanation! I'll edit my previous comment.

@rossberg (Member)

See #1524 for a fix to the editorial issues.

@rossberg (Member)

I'll close this PR then.
